{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Vehicle Routing Problem"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pre-requisites \n",
    "\n",
    "### Imports\n",
    "\n",
    "To get started, we'll import the Python libraries we need, set up the environment with a few prerequisites for permissions and configurations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sagemaker\n",
    "import boto3\n",
    "import sys\n",
    "import os\n",
    "\n",
    "from sagemaker.rl import RLEstimator\n",
    "\n",
    "sys.path.append(\"common\")\n",
    "from misc import get_execution_role\n",
    "import docker_utils\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Setup S3 bucket\n",
    "\n",
    "Set up the linkage and authentication to the S3 bucket that you want to use for checkpoint and the metadata."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "S3 bucket path: s3://sagemaker-us-west-2-975155340573/\n"
     ]
    }
   ],
   "source": [
    "sage_session = sagemaker.session.Session()\n",
    "s3_bucket = sage_session.default_bucket()  \n",
    "s3_output_path = 's3://{}/'.format(s3_bucket)\n",
    "print(\"S3 bucket path: {}\".format(s3_output_path))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Configure where training happens\n",
    "You can train your RL training jobs using the SageMaker ML instance. You can choose the instance type from avaialable [ml instances](https://aws.amazon.com/sagemaker/pricing/instance-types/)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "local_mode = False\n",
    "# If on SageMaker, pick the instance type\n",
    "instance_type = \"ml.m5.large\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create an IAM role\n",
    "Either get the execution role when running from a SageMaker notebook instance `role = sagemaker.get_execution_role()` or, when running from local notebook instance, use utils method `role = get_execution_role()` to create an execution role.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Using IAM role arn: arn:aws:iam::975155340573:role/service-role/AmazonSageMaker-ExecutionRole-20190502T151333\n"
     ]
    }
   ],
   "source": [
    "try:\n",
    "    role = sagemaker.get_execution_role()\n",
    "except:\n",
    "    role = get_execution_role()\n",
    "\n",
    "print(\"Using IAM role arn: {}\".format(role))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define the Sagemaker-RL docker image for Ray\n",
    "\n",
    "We will use the `TensorFlow Ray` for this project. For a list of available RL images see https://github.com/aws/sagemaker-rl-container \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Building docker image rl-baseline-repo from common/Dockerfile\n",
      "$ docker build -t rl-baseline-repo -f common/Dockerfile . --build-arg BASE_IMAGE=520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-rl-tensorflow:ray0.6.5-cpu-py3\n",
      "Sending build context to Docker daemon  161.3kB\n",
      "Step 1/3 : ARG BASE_IMAGE=520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-rl-tensorflow:ray0.6.5-gpu-py3\n",
      "Step 2/3 : FROM $BASE_IMAGE\n",
      " ---> 2cb0fbd42e4f\n",
      "Step 3/3 : RUN pip install --upgrade pip && pip install --no-cache-dir xpress networkx\n",
      " ---> Using cache\n",
      " ---> bd97df84c9ea\n",
      "Successfully built bd97df84c9ea\n",
      "Successfully tagged rl-baseline-repo:latest\n",
      "Done building docker image rl-baseline-repo\n",
      "ECR repository already exists: rl-baseline-repo\n",
      "WARNING! Using --password via the CLI is insecure. Use --password-stdin.\n",
      "WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.\n",
      "Configure a credential helper to remove this warning. See\n",
      "https://docs.docker.com/engine/reference/commandline/login/#credentials-store\n",
      "\n",
      "Login Succeeded\n",
      "Logged into ECR\n",
      "$ docker tag rl-baseline-repo 975155340573.dkr.ecr.us-west-2.amazonaws.com/rl-baseline-repo\n",
      "Pushing docker image to ECR repository 975155340573.dkr.ecr.us-west-2.amazonaws.com/rl-baseline-repo\n",
      "\n",
      "$ docker push 975155340573.dkr.ecr.us-west-2.amazonaws.com/rl-baseline-repo\n",
      "The push refers to repository [975155340573.dkr.ecr.us-west-2.amazonaws.com/rl-baseline-repo]\n",
      "3e5c203c31d4: Preparing\n",
      "ddf4d03c1114: Preparing\n",
      "4ca2774d03ad: Preparing\n",
      "5f529b3bfb80: Preparing\n",
      "98e304f283f0: Preparing\n",
      "23681bc45eb1: Preparing\n",
      "6b1c8a672492: Preparing\n",
      "24a50b276aa2: Preparing\n",
      "20d31e3bf939: Preparing\n",
      "f5bfc884818e: Preparing\n",
      "e02974668a2f: Preparing\n",
      "447587c31234: Preparing\n",
      "af85d3ef13a4: Preparing\n",
      "61b79d490960: Preparing\n",
      "05862bea4cbf: Preparing\n",
      "93b18a689dde: Preparing\n",
      "c41c3ed9eaf4: Preparing\n",
      "d908c11b1a82: Preparing\n",
      "7ccfaa7554e3: Preparing\n",
      "89ec57aea3bf: Preparing\n",
      "a0c1e01578b7: Preparing\n",
      "6b1c8a672492: Waiting\n",
      "24a50b276aa2: Waiting\n",
      "20d31e3bf939: Waiting\n",
      "f5bfc884818e: Waiting\n",
      "e02974668a2f: Waiting\n",
      "447587c31234: Waiting\n",
      "af85d3ef13a4: Waiting\n",
      "61b79d490960: Waiting\n",
      "05862bea4cbf: Waiting\n",
      "93b18a689dde: Waiting\n",
      "c41c3ed9eaf4: Waiting\n",
      "d908c11b1a82: Waiting\n",
      "7ccfaa7554e3: Waiting\n",
      "89ec57aea3bf: Waiting\n",
      "a0c1e01578b7: Waiting\n",
      "23681bc45eb1: Waiting\n",
      "98e304f283f0: Layer already exists\n",
      "5f529b3bfb80: Layer already exists\n",
      "4ca2774d03ad: Layer already exists\n",
      "ddf4d03c1114: Layer already exists\n",
      "3e5c203c31d4: Layer already exists\n",
      "6b1c8a672492: Layer already exists\n",
      "23681bc45eb1: Layer already exists\n",
      "f5bfc884818e: Layer already exists\n",
      "24a50b276aa2: Layer already exists\n",
      "20d31e3bf939: Layer already exists\n",
      "e02974668a2f: Layer already exists\n",
      "447587c31234: Layer already exists\n",
      "af85d3ef13a4: Layer already exists\n",
      "05862bea4cbf: Layer already exists\n",
      "61b79d490960: Layer already exists\n",
      "93b18a689dde: Layer already exists\n",
      "d908c11b1a82: Layer already exists\n",
      "c41c3ed9eaf4: Layer already exists\n",
      "7ccfaa7554e3: Layer already exists\n",
      "89ec57aea3bf: Layer already exists\n",
      "a0c1e01578b7: Layer already exists\n",
      "latest: digest: sha256:de1d420f36a7caaa1f8b691871b187ceadb04a9f1c8ac690fda0d1e708e3b068 size: 4720\n",
      "Done pushing 975155340573.dkr.ecr.us-west-2.amazonaws.com/rl-baseline-repo\n",
      "Using ECR image 975155340573.dkr.ecr.us-west-2.amazonaws.com/rl-baseline-repo\n"
     ]
    }
   ],
   "source": [
    "cpu_or_gpu = 'gpu' if instance_type.startswith('ml.p') else 'cpu'\n",
    "aws_region = boto3.Session().region_name\n",
    "repository_name = \"rl-baseline-repo\"\n",
    "base_image_name = \"520713654638.dkr.ecr.{}.amazonaws.com/sagemaker-rl-tensorflow:ray0.6.5-{}-py3\".format(aws_region, cpu_or_gpu)\n",
    "# docker_utils._ecr_login_if_needed(base_image_name)\n",
    "custom_image_name = docker_utils.build_and_push_docker_image(repository_name, dockerfile='common/Dockerfile', build_args={\"BASE_IMAGE\":base_image_name})\n",
    "print(\"Using ECR image %s\" % custom_image_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run the Mixed Integer Programming (MIP) Baseline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define Metric\n",
    "A list of dictionaries that defines the metric(s) used to evaluate the training jobs. Each dictionary contains two keys: ‘Name’ for the name of the metric, and ‘Regex’ for the regular expression used to extract the metric from the logs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "baseline_metric_definitions = [{'Name': 'episode_reward_mean',\n",
    "  'Regex': 'episode_reward_mean: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'},\n",
    " {'Name': 'episode_reward_max',\n",
    "  'Regex': 'episode_reward_max: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'}]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define Estimator\n",
    "This Estimator executes the baseline generation script in a managed execution environment within a SageMaker Training Job. The managed environment is an Amazon-built Docker container that executes functions defined in the supplied entry_point Python script."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create a descriptive job name\n",
    "baseline_job_name_prefix = \"baselinevrp\"\n",
    "\n",
    "train_entry_point = \"train_baseline_mip.py\"\n",
    "train_job_max_duration_in_seconds = 3600 * 24 * 2\n",
    "\n",
    "baseline_estimator = RLEstimator(entry_point= train_entry_point,\n",
    "                        source_dir=\"src\",\n",
    "                        dependencies=[\"common/sagemaker_rl\"],\n",
    "                        image_name=custom_image_name,\n",
    "                        role=role,\n",
    "                        train_instance_type=instance_type,\n",
    "                        train_instance_count=1,\n",
    "                        output_path=s3_output_path,\n",
    "                        base_job_name=baseline_job_name_prefix,\n",
    "                        metric_definitions=baseline_metric_definitions,\n",
    "                        train_max_run=train_job_max_duration_in_seconds,\n",
    "                        hyperparameters={}\n",
    "                       )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training job: baselinevrp-2019-10-22-08-38-50-645\n"
     ]
    }
   ],
   "source": [
    "baseline_estimator.fit(wait=local_mode)\n",
    "baseline_job_name = baseline_estimator.latest_training_job.job_name\n",
    "print(\"Training job: %s\" % baseline_job_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run the RL model\n",
    "\n",
    "The [RLEstimator](https://sagemaker.readthedocs.io/en/stable/sagemaker.rl.html) is used for training RL jobs. \n",
    "\n",
    "1. Specify the source directory where the gym environment and training code is uploaded.\n",
    "2. Specify the entry point as the training code \n",
    "3. Specify the choice of RL toolkit and framework. This automatically resolves to the ECR path for the RL Container. \n",
    "4. Define the training parameters such as the instance count, job name, S3 path for output and job name. \n",
    "5. Specify the hyperparameters for the RL agent algorithm. The RLCOACH_PRESET or the RLRAY_PRESET can be used to specify the RL agent algorithm you want to use. \n",
    "6. Define the metrics definitions that you are interested in capturing in your logs. These can also be visualized in CloudWatch and SageMaker Notebooks. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define Metric\n",
    "A list of dictionaries that defines the metric(s) used to evaluate the training jobs. Each dictionary contains two keys: ‘Name’ for the name of the metric, and ‘Regex’ for the regular expression used to extract the metric from the logs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "rl_metric_definitions = [{'Name': 'episode_reward_mean',\n",
    "  'Regex': 'episode_reward_mean: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'},\n",
    " {'Name': 'episode_reward_max',\n",
    "  'Regex': 'episode_reward_max: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'},\n",
    " {'Name': 'entropy',\n",
    "  'Regex': 'entropy: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'},\n",
    " {'Name': 'episode_reward_min',\n",
    "  'Regex': 'episode_reward_min: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'},\n",
    " {'Name': 'vf_loss',\n",
    "  'Regex': 'vf_loss: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'},\n",
    " {'Name': 'policy_loss',\n",
    "  'Regex': 'policy_loss: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'},                                            \n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define Estimator\n",
    "This Estimator executes an RLEstimator script in a managed Reinforcement Learning (RL) execution environment within a SageMaker Training Job. The managed RL environment is an Amazon-built Docker container that executes functions defined in the supplied entry_point Python script."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create a descriptive job name\n",
    "job_name_prefix = 'vrp-n2map55od10pnty50'\n",
    "\n",
    "train_entry_point = \"train_vehicle_routing_problem_ppo.py\"\n",
    "train_job_max_duration_in_seconds = 3600 * 24 * 2\n",
    "\n",
    "estimator = RLEstimator(entry_point= train_entry_point,\n",
    "                        source_dir=\"src\",\n",
    "                        dependencies=[\"common/sagemaker_rl\"],\n",
    "                        image_name=custom_image_name,\n",
    "                        role=role,\n",
    "                        train_instance_type=instance_type,\n",
    "                        train_instance_count=1,\n",
    "                        output_path=s3_output_path,\n",
    "                        base_job_name=job_name_prefix,\n",
    "                        metric_definitions=rl_metric_definitions,\n",
    "                        train_max_run=train_job_max_duration_in_seconds,\n",
    "                        hyperparameters={}\n",
    "                       )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training job: vrp-n2map55od10pnty50-2019-10-22-08-36-48-964\n"
     ]
    }
   ],
   "source": [
    "estimator.fit(wait=local_mode)\n",
    "rl_job_name = estimator.latest_training_job.job_name\n",
    "print(\"Training job: %s\" % rl_job_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Visualization\n",
    "\n",
    "RL training can take a long time.  So while it's running there are a variety of ways we can track progress of the running training job.  Some intermediate output gets saved to S3 during training, so we'll set up to capture that."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "from sagemaker.analytics import TrainingJobAnalytics\n",
    "\n",
    "def visualize(job_name):\n",
    "    if not local_mode:\n",
    "        df = TrainingJobAnalytics(job_name, ['episode_reward_mean']).dataframe()\n",
    "        df_min = TrainingJobAnalytics(job_name, ['episode_reward_min']).dataframe()\n",
    "        df_max = TrainingJobAnalytics(job_name, ['episode_reward_max']).dataframe()\n",
    "        df['rl_reward_mean'] = df['value']\n",
    "        df['rl_reward_min'] = df_min['value']\n",
    "        df['rl_reward_max'] = df_max['value']\n",
    "        num_metrics = len(df)\n",
    "    \n",
    "        if num_metrics == 0:\n",
    "            print(\"No algorithm metrics found in CloudWatch\")\n",
    "        else:\n",
    "            plt = df.plot(x='timestamp', y=['rl_reward_mean'], figsize=(18,6), fontsize=18, legend=True, style='-', color=['b','r','g'])\n",
    "            plt.fill_between(df.timestamp, df.rl_reward_min, df.rl_reward_max, color='b', alpha=0.2)\n",
    "            plt.set_ylabel('Mean reward per episode', fontsize=20)\n",
    "            plt.set_xlabel('Training time (s)', fontsize=20)\n",
    "            plt.legend(loc=4, prop={'size': 20})\n",
    "    else:\n",
    "        print(\"Can't plot metrics in local mode.\")\n",
    "    \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "visualize(job_name=rl_job_name) # If this errors, it is possible that training is not yet started and no metrics are published"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "visualize(job_name=baseline_job_name) # If this errors, it is possible that training is not yet started and no metrics are published"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_tensorflow_p36",
   "language": "python",
   "name": "conda_tensorflow_p36"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
